August 2009, EBE # 500 


REGIONAL EDUCATIONAL LABORATORY
SOUTHEAST, SERVE Center


EVIDENCE BASED EDUCATION REQUEST DESK


OUR GOAL 

To assist educators and policymakers in their efforts to apply the evidence base to decisions about policies, programs, and practices they encounter.



REQUEST: 

What are some effective or well-developed systems to evaluate teacher effectiveness that include student-learning impact data? Please provide the pros and cons of such systems.

We organized the response to this broad question into several sections. 

1) What are the various approaches to assessing student growth in achievement? Researchers have developed several statistical approaches. There is concern both about their relative strengths and weaknesses in generating reliable and valid growth scores and about whether the resulting estimates accurately measure teacher effects/quality. For this reason, we also provide a list of possible experts on this issue and can provide the backgrounds/expertise of some of these experts if needed.

2) What are the issues that have been raised around the validity of using value-added scores for high-stakes decisions about individual teachers? As this section concludes, several issues argue against treating value-added scores as the sole measure of teachers' effectiveness, especially for high-stakes decisions. Experts advise using a broader definition of teacher effectiveness and multiple measures for this purpose.

3) What are some examples of the use of teacher-level value-added modeling (VAM) scores in states or districts?


If you have any questions regarding this document, please contact the 
REL-SE, 1-800-755-3277 or RELSoutheast@serve.org 




RESPONSE 


Since the passage of the No Child Left Behind Act of 2001 (2002), there has been increased 
interest in using student achievement data (through standardized tests) to evaluate teacher 
effectiveness. Two U.S. Department of Education secretaries, Secretary Spellings and Secretary 
Duncan, have expressed interest in growth models and the need to improve the way teacher 
performance is evaluated. At the same time, there has been interest in creating longitudinal data 
systems so that student growth can be more accurately measured. The combined push for 
improvements in student assessments, teacher evaluation, and longitudinal student databases has 
led to the current educational agenda that lauds those systems that link student achievement to 
individual teachers. The following sections describe four major value-added models (VAMs), 
concerns with VAMs as the sole measure of teacher effectiveness, some district and state 
examples, and other resources related to the issue of teacher effectiveness. 


Matrix of Growth Models 

Currently, one of the more prominent approaches to estimating the impact of teachers and schools on student achievement is value-added modeling (VAM), a subset of what are commonly referred to as growth models.¹ Growth models stand in contrast to status models. Status models provide a snapshot of a school's or subgroup's performance (generally measured in proficiency levels) at a specific point in time, while growth models chart students' learning gains over time (see Goldschmidt et al., 2005). This section provides a brief overview of some models currently in use; it is not intended as an exhaustive list of how all states or districts are assessing achievement growth. (A separate bibliography includes the references cited within the matrix/table.)
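To make the status/growth contrast concrete, here is a minimal Python sketch using made-up scores for five students (the student IDs, scores, and cut score are all hypothetical). The status view reports only the share proficient at one point in time; the growth view follows the same students across two years. It illustrates the distinction only and is not any state's actual growth model.

```python
from statistics import mean

# Hypothetical scale scores for the same five students in two consecutive years.
year1 = {"s1": 310, "s2": 295, "s3": 350, "s4": 280, "s5": 330}
year2 = {"s1": 335, "s2": 320, "s3": 360, "s4": 310, "s5": 345}

PROFICIENT_CUT = 320  # hypothetical proficiency cut score


def percent_proficient(scores, cut=PROFICIENT_CUT):
    """Status view: share of students at or above the cut at ONE point in time."""
    return 100 * sum(s >= cut for s in scores.values()) / len(scores)


def mean_gain(before, after):
    """Growth view (simplest form): average change for the same students."""
    return mean(after[sid] - before[sid] for sid in before)


print(percent_proficient(year1), percent_proficient(year2))  # 40.0 80.0
print(mean_gain(year1, year2))  # 21
```

In this toy data, student s4 gains the most (30 points) yet never counts toward the status measure, because that student's score stays below the cut.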

Table 1 presents four models that represent distinct methodological approaches to estimating value added. The Sanders model, known most widely for its applications in the Tennessee Value-Added Assessment System (TVAAS) and the SAS Education Value-Added Assessment System (EVAAS) in North Carolina, has the broadest use. Both Ohio and Pennsylvania have adapted versions of the Sanders methodology for use in their respective accountability systems.² As the lengthy list of citations in Table 1 indicates, the Sanders model has been widely discussed in the literature. The model uses a multilevel "layered" approach in which students serve as their own "controls." The RAND "model" is essentially an expansion of the Sanders model that includes additional controls for students, teachers, and schools and examines interaction effects among various sets of covariates. The Hanushek model uses a fixed-effects approach, popular within econometrics, in which a series of dummy variables accounts for time-invariant variables that are not otherwise included in the model; the authors have expanded the model to include school-by-year fixed effects as well. The Chicago approach most closely resembles more recent proposals within some of the differentiated accountability models and is the most distinctive of the four models described in Table 1. It deliberately attempts to model the gains/growth in student test scores by using large longitudinal datasets with vertically aligned/scaled test scores.

1 For overviews of growth modeling and VAM generally, see the notes for Table 1.

2 Ohio value-added model ("Battelle for Kids") information and website: http://portal.battelleforkids.org/ohio/home.html?sflang=en; the Pennsylvania Value-Added Assessment System (PVAAS) information, website, and related research: http://www.pde.state.pa.us/a_and_t/cwp/view.asp?A=108&Q=108916; see also McCaffrey & Hamilton (2007) for a study of the earlier implementation of the PVAAS system.


REL

Evaluating Teacher Effectiveness
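Both the fixed-effects approach and the idea of students serving as their own "controls" rest on removing stable, student-specific differences. The sketch below shows only that core step, the "within" (demeaning) transformation, on a hypothetical three-year panel for two students. It is not the Hanushek et al. specification or the Sanders layered model, both of which add teacher and school terms and much more.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical panel of (student_id, year, score). Student "b" scores exactly
# 50 points below student "a" every year: a stable, student-specific offset.
panel = [
    ("a", 1, 300.0), ("a", 2, 320.0), ("a", 3, 335.0),
    ("b", 1, 250.0), ("b", 2, 270.0), ("b", 3, 285.0),
]


def within_transform(rows):
    """Subtract each student's own mean score (the demeaning step behind
    student fixed effects). Any constant per-student offset cancels out."""
    by_student = defaultdict(list)
    for sid, _year, score in rows:
        by_student[sid].append(score)
    student_mean = {sid: mean(scores) for sid, scores in by_student.items()}
    return [(sid, year, score - student_mean[sid]) for sid, year, score in rows]


for sid, year, demeaned in within_transform(panel):
    print(sid, year, round(demeaned, 2))
```

After demeaning, the rows for students a and b are identical, so their stable 50-point difference can no longer be mistaken for a teacher or school effect.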


Table 1: Matrix of some value-added models

Chicago Public Schools
• Type of analytical model: "Productivity profile" model in which initial status trends (input trends), gain trends (learning gains), and output trends (summing input and learning gains) are estimated for each grade level.
• Strengths: Designed around a strong testing system; estimates gain as well as the learning gap; adjusts for school- and student-level variables.
• Weaknesses: Does not estimate gain trends as a function of initial status; requires vertically scaled test scores; requires expertise to estimate models.
• Select references for specific model or related specifications: Bryk et al. (1998); Millman (1997); Ponisciak & Bryk (2005); Thum & Bryk (1997).

Hanushek et al.
• Type of analytical model: Fixed-effects model with school and student covariates.
• Strengths: Few assumptions with the fixed-effects model; very flexible; uses multiple grades/cohorts to remove omitted-variables bias.
• Weaknesses: Does not estimate effects at the classroom/teacher level, only at the grade level; provides only a "lower bound" estimate; needs a large sample of teachers.
• Select references for specific model or related specifications: Hanushek et al. (1998); Kain (1998); Rivkin et al. (2005). On fixed-effects models generally, see Andrabi et al. (2008); Harris & Sass (2006); Lockwood & McCaffrey (2007a) & (2007b); and Todd & Wolpin (2003).

RAND
• Type of analytical model: Multilevel longitudinal mixed model with student and teacher covariates (can have cross-classification as well); the "general model" as expanded from Sanders.
• Strengths: Does not assume prior teacher effects are constant and can test prior vs. current teacher effects; accounts for school and student effects.
• Weaknesses: Requires large data sets to estimate effects; model affected by small class/student sizes; does not estimate school effects separately.
• Select references for specific model or related specifications: Goldschmidt et al. (2005); Heck (2009); Hibpshman (2004); Lockwood et al. (2005); Lockwood, McCaffrey, Mariano, & Setodji (2007); McCaffrey et al. (2004a) & (2004b); and Thum (2003a).

Sanders/EVAAS/TVAAS
• Type of analytical model: Layered mixed-effects model.
• Strengths: Uses all available data and can estimate effects even with missing data; does not assume linear growth; implicitly adjusts for prior achievement; additive.
• Weaknesses: Assumes teacher effects are cumulative and constant over time; does not control for student covariates or teacher effects.
• Select references for specific model or related specifications: Ballou et al. (2004); Lissitz (2005); Millman (1997); Ross et al. (2003); Sanders (2006); Sanders et al. (1997); Sanders & Rivers (1996); Wright & Sanders (2008); and Wright et al. (2006). Critiques of Sanders/EVAAS/TVAAS: Amrein-Beardsley (2008); Hibpshman (2004); Kupermintz (2003); McCaffrey et al. (2003); see Ballou (2004) and Sanders & Wright (2008) for responses.


Notes: See Goldschmidt et al. (2005), Hibpshman (2004), McCaffrey et al. (2003), and Tekwe et al. (2004) for more detailed reviews of the models presented. Overviews of special interest that provide comparisons of the various growth/VAM models/specifications and related issues include: Betebenner (2004), (2008), & (2009); Bryk et al. (1998); Hibpshman (2004); Goldschmidt et al. (2005); Harris & Sass (2006); Linn (2008); Lissitz (2005) & (2006); Raudenbush (2004a) & (2004b); Rowan et al. (2002); Seltzer et al. (2003); Tekwe et al. (2004); Thum (2003a) & (2003b); and Wright et al. (2006).


Because comparing the reliability and validity of the various statistical approaches is complex and difficult to understand, we offer below a preliminary list of researchers, identified in our reading, who might be able to consult on comparing statistical modeling approaches.


1. Dale Ballou (Vanderbilt University) 

2. Julian R. Betts (University of California - San Diego & National Bureau of Economic 
Research) 

3. Henry Braun (Boston College & Educational Testing Service) 

4. Anthony Bryk (Stanford University & Consortium on Chicago School Research)

5. Harold Doran (American Institutes for Research) 

6. Laura Goe (Educational Testing Service) 

7. Dan Goldhaber (University of Washington & Center for Analysis of Longitudinal Data in 
Education Research) 

8. Pete Goldschmidt (National Center for Research on Evaluation, Standards, and Student 
Testing [CRESST]) 

9. Bing Han (RAND Corporation) 

10. Douglas Harris (University of Wisconsin - Madison & Wisconsin Center for Education 
Research) 

11. Brian Jacob (University of Michigan & National Bureau of Economic Research) 

12. Thomas Kane (Harvard University) 

13. Cory Koedel (University of Missouri - Columbia) 

14. Spyros Konstantopoulos (Northwestern University) 

15. J.R. Lockwood (RAND Corporation) 

16. Daniel McCaffrey (RAND Corporation) 

17. Robert Meyer (University of Wisconsin - Madison & Wisconsin Center for Education 
Research) 

18. Stephen M. Ponisciak (University of Wisconsin - Madison & Wisconsin Center for 
Education Research) 

19. Steve Raudenbush (University of Chicago) 

20. Steven Rivkin (Amherst College & National Bureau of Economic Research)

21. Jesse Rothstein (Princeton University & National Bureau of Economic Research) 

22. Tim Sass (Florida State University & Center for Analysis of Longitudinal Data in 
Education Research) 

23. Yeow Meng Thum (Michigan State University) 




Validity of Value-Added Modeling at the Teacher Level 
for Use in Assessing Teacher Effectiveness 

Most experts conclude that growth models for holding schools and districts accountable, though imperfect, are better than status models.³ However, with the advent of this methodology, more uses for value-added modeling (VAM) are being considered. In this section, we summarize the concerns raised in several reviews about the validity of value-added scores when used as the sole measure of teacher effectiveness.

Assessing teacher effectiveness and quality has always proven to be challenging due to the 
complex, dynamic, and multifaceted nature of teaching itself, with each approach presenting its 
own set of strengths and challenges. For example, teacher evaluation ratings by administrators or 
other observers could be skewed toward the high end because they can be influenced by various 
other ongoing professional relationships within a school. In addition, a high level of training of 
the evaluator or observer is required to ensure consistent and reliable use of methods. In contrast, 
achievement scores as a measure of teacher effectiveness might seem more objective and 
straightforward, but there are also factors that reduce the accuracy and fairness of these estimates 
of effectiveness. A March 2009 brief from the National Comprehensive Center for Teacher Quality
entitled Methods of Evaluating Teacher Effectiveness points out that just as there are conditions 
that need to be in place for the effective use of classroom observation measures (e.g., availability 
of a high-quality observation instrument based on standards of effective teaching practice), there 
are also conditions or issues involved in using value-added measures to make judgments 
regarding teacher effectiveness (Goe & Croft, 2009). 

Questions about “validity” are paramount, particularly to the degree that the results are used for 
“high stakes” personnel decisions such as reassignments, providing increased levels of costly 
professional development support for particular teachers, singling out weak teachers for possible 
firing, or providing merit pay/performance incentives to successful teachers. 

Below we articulate some of the validity issues that have been raised about judging teacher 
effectiveness through value-added methods of various kinds. Validity considerations focus on the 
question of “how well the model accurately captures an individual teacher’s contribution to 
student achievement growth in a particular subject area” (Goe, Bell, & Little, 2008, p. 51). 


Possibility of Inappropriate Attributions: Are there factors other than the “teacher” that 
might impact average growth scores across classrooms of students? 

VAM assumes that, once the statistical algorithm is applied, a group of students' achievement gains for a school year are due to the teacher. But students are generally not randomly assigned to classrooms, so factors related to the assignment process (e.g., teachers with more seniority may get the students they want, while teachers with less seniority may be assigned more difficult students) might cause classrooms of students to realize more or less growth in achievement. In some cases, a high or low level of growth could reflect the particular mix of students assembled in a classroom more than the teacher's effectiveness.

3 Status models provide a snapshot of student or subgroup proficiency rates compared to a specified benchmark. These models are often restricted to one time period (or the average of two time periods). Adequate Yearly Progress (AYP) reports are a good example of statistics based upon status models.

For example, Rothstein (2008, 2009a, & 2009b) and Koedel and Betts (2009a) examined data in 
North Carolina and San Diego, respectively, and found effects of student tracking on VAM 
estimates. However, Koedel and Betts found that using three consecutive years of teacher data 
can mitigate the bias introduced by student sorting. Both sets of authors caution against using a 
single year of teacher data to estimate VAMs. 

Braun (2005) advised that, in the absence of random assignment of students to teachers, causally attributing high or low growth in students' achievement to a particular teacher's effectiveness can be misleading. That is, VAM allows teachers to be rank-ordered in terms of their effects (which are “the output of a statistical algorithm”), but the rank ordering should not be equated solely with “teacher effectiveness,” because other influences on student achievement are possible. Variations in the quality of school resources or peer effects within a classroom are examples of factors other than the “teacher effect” that could affect growth in achievement scores differentially.


Stability of Estimated Effects for Individual Teachers: Are individual teacher effects stable 
over years? 

Braun (2005) describes this as the issue of “precision of estimates.” Braun (p. 10) also indicates 
that each teacher’s ranking might be based on a relatively small number of students, so one or 
two outliers might impact their relative standing from one year to the next: 

Suppose for example that there are a small number of truly disruptive students in a 
cohort. While all teachers may have an equal chance of finding one (or more) of those 
students in their class each year, only a few actually will - with potentially deleterious 
impact on the academic growth of the class in that year. The bottom line is that even if 
teachers and students come together in more or less random ways, estimated teacher 
effects can be quite variable from year to year .... Moreover, the estimates can be quite 
volatile. So treating estimated teacher effects as accurate indicators of teacher 
effectiveness is problematic. 
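Braun's point about a few disruptive students can be checked with simple arithmetic. In this hypothetical sketch every student in a class gains 20 points except one whose gain is -40, so the drop in the class mean equals 60 divided by the number of students. The gain values and class sizes are invented for illustration.

```python
from statistics import mean


def class_mean_gain(typical_gain, n_students, outlier_gain=None):
    """Mean gain for a hypothetical class in which every student gains
    `typical_gain` points, except (optionally) one student who gains
    `outlier_gain` points."""
    gains = [typical_gain] * n_students
    if outlier_gain is not None:
        gains[0] = outlier_gain
    return mean(gains)


for n in (10, 30):
    clean = class_mean_gain(20, n)
    with_outlier = class_mean_gain(20, n, outlier_gain=-40)
    print(n, clean, with_outlier, clean - with_outlier)
```

With 10 students the class mean falls from 20 to 14; with 30 students it falls only to 18. The same single student moves a small class's estimate three times as much.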

McCaffrey et al. (2009) used Florida data and found mixed results for the stability of estimated VAM teacher effects across both the models used and the metric chosen to measure student learning. The authors caution: “Adoption of an accountability system based solely on value added estimates of teachers’ performance will result in considerable variation in who is rewarded across time” (p. 33).


Assessment Issues: Do variations in the quality or difficulty or degree of vertical alignment 
of state tests from year to year or from one grade level to the next affect the accuracy of the 
value-added scores for individual teachers? 




The value-added approach makes an assumption that tests can be equated from year to year or 
across subjects such that a scale score one year means the same thing the next year. The extent to 
which differences in tests or forms of tests affect value-added scores is another consideration in 
interpreting value-added scores of teachers. Sass (2008) in a study using Florida data reported 
that “. . . it is clear that different tests result in different teacher rankings” (p. 5). 

In one study using four years of longitudinal data for grades 6-8 from one large urban school 
district, Lockwood, McCaffrey, Hamilton et al. (2007) found considerable variation in estimated 
teacher effects with changes in the mathematics assessments. This study found that even 
subscales of the same test, by the same test developer, can yield different results, as can different 
weighting among subscales in a composite score. Although the specific findings from this district 
might not be replicated in other contexts, they provide evidence that inferences based on VAM
can, at least in some cases, be affected by the characteristics of the outcome measure. These 
findings do suggest reason for caution. Lockwood et al. (2007, p. 61) went on to say: 

Users of VAM must resist the temptation to interpret estimates as pure, stable measures 
of teacher effectiveness. Application of VAM, particularly for high-stakes purposes, 
should be accompanied by an examination of both the test and its alignment with the 
desired curriculum and instructional approach. And to the extent possible, analyses 
should explore the sensitivity of the estimates to different ways of combining information 
from test items. 
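The recommendation to explore how sensitive estimates are to the way subscale information is combined can be illustrated with a small hypothetical example: the same three teachers' mean subscale gains produce opposite rankings under two composite weightings. Teacher labels, subscale names, gains, and weights are all invented.

```python
# Hypothetical mean subscale gains for three teachers' classes.
subscale_gains = {
    "Teacher A": {"computation": 12.0, "problem_solving": 4.0},
    "Teacher B": {"computation": 6.0, "problem_solving": 9.0},
    "Teacher C": {"computation": 8.0, "problem_solving": 7.0},
}


def rank_teachers(gains, weights):
    """Rank teachers (best first) by a weighted composite of subscale gains."""
    composite = {
        name: sum(weights[k] * v for k, v in sub.items())
        for name, sub in gains.items()
    }
    return sorted(composite, key=composite.get, reverse=True)


print(rank_teachers(subscale_gains, {"computation": 0.7, "problem_solving": 0.3}))
print(rank_teachers(subscale_gains, {"computation": 0.3, "problem_solving": 0.7}))
```

Weighting computation heavily ranks the teachers A, C, B; weighting problem solving heavily ranks them B, C, A. Nothing about the classes changed, only the scoring rule.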

Koedel and Betts (2009a) tested for ceiling effects in VAM estimates under a broad set of 
conditions and found VAM estimates to be “negligibly affected” (p. 27). However, when 
working in minimum competency or proficiency-based testing environments, “ceiling conditions 
in such environments can significantly alter VAM estimates of individual teacher effects” (p. 27).
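A ceiling effect can be sketched with a hypothetical 100-point test: two classes whose students all truly gain 15 points look different once the posttest cannot report scores above the maximum. This is a toy illustration of the mechanism under invented scores, not the Koedel and Betts (2009a) analysis.

```python
MAX_SCALE_SCORE = 100  # hypothetical test ceiling


def observed_gain(prior, true_gain, ceiling=MAX_SCALE_SCORE):
    """Observed gain when the posttest cannot report scores above `ceiling`."""
    return min(prior + true_gain, ceiling) - prior


# Two hypothetical classes whose students all truly gain 15 points.
class_low_start = [60, 65, 70, 75]
class_high_start = [85, 88, 90, 95]

for label, priors in (("low start", class_low_start),
                      ("high start", class_high_start)):
    gains = [observed_gain(p, 15) for p in priors]
    print(label, gains, sum(gains) / len(gains))
```

The class that starts near the ceiling shows a mean observed gain of 10.5 rather than 15, so a teacher-level estimate built on these scores would understate that teacher's contribution.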


Missing Data: How might missing data affect the meaning of a teacher’s effectiveness 
score? 

Teachers with small class sizes, teachers with select subgroups of students (e.g., special needs, Title I pull-out classes), or teachers in content/subject areas not assessed can pose unique
problems for value-added analysis. Imagine a scenario where one teacher’s value-added score is 
based on data from 10 students (roughly 30% of the class) who had all the data points needed for 
the analysis, but another teacher’s rank order is based on data from 20 students (roughly 90% of 
the class). Are their value-added scores equally valid? Braun (2005) indicates that a district 
database can have substantial amounts of missing data. If many teacher/student links or test 
scores are missing, there might be some bias in effect scores generated. Goe and Croft (2009) 
suggest that states/districts should evaluate the accuracy of their data (links of teachers to 
students and extent of missing data) as a precursor to thinking about value-added approaches. 

McCaffrey et al. (2009) caution against using VAM measures for high-stakes decisions about teachers with few tested students (for example, teachers with small classes or larger numbers of students with disabilities). Such teachers will tend to be over-represented at the extremes of the distribution of VAM estimates, “so rewarding or penalizing the top or bottom performers would emphasize these teachers and will limit the efficacy of policies designed to identify teachers whose performance is truly exceptional” (McCaffrey et al., 2009, p. 32).
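The tendency for teachers with few tested students to land at the extremes can be reproduced in a short hypothetical simulation: every teacher has the same true effect (zero), yet the top 10% of raw estimates is dominated by small-class teachers simply because their class means are noisier. Class sizes, the student-level SD, and the seed are assumptions, and the sketch uses unshrunk class means rather than any specific VAM; mixed-model approaches shrink small-sample estimates toward the average, which changes this picture.

```python
import random
from statistics import fmean


def top_decile_small_share(n_small=100, n_large=100, small_n=5, large_n=50,
                           student_sd=5.0, seed=7):
    """All teachers have the SAME true effect (zero). Return the share of the
    top 10% of estimated effects that belong to small-class teachers."""
    rng = random.Random(seed)
    estimates = []  # (estimated effect, is_small_class)
    for size, count in ((small_n, n_small), (large_n, n_large)):
        for _ in range(count):
            est = fmean(rng.gauss(0, student_sd) for _ in range(size))
            estimates.append((est, size == small_n))
    estimates.sort(reverse=True)
    top = estimates[: len(estimates) // 10]
    return sum(is_small for _, is_small in top) / len(top)


print(top_decile_small_share())
```

Although half the teachers have small classes, nearly all of the "top performers" come from that half, which is exactly the pattern McCaffrey et al. warn would distort reward or sanction policies.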


Meaning of the Rank Ordering: Does the rank ordering of teachers on their value-added 
scores correlate with other measures of teacher or teaching quality? 

The validity of teacher effectiveness scores resulting from the application of a value-added 
analysis can also be examined in terms of the relationships of these scores to other data on the 
same teachers. That is, do these estimates of teachers’ impact on student achievement relate to 
other measures of teachers’ performance or teaching quality? A recent research brief by the
National Comprehensive Center for Teacher Quality (Goe & Croft, 2009; see also Goe, 2008) concluded
that studies to date have been unable to pinpoint or correlate other indicators of teaching quality
with these value-added scores for teachers. Thus, value-added analysis may point out teachers
who are high or low in rankings in terms of their students’ achievement gains. However, what 
that means in terms of teaching strategies, styles, content knowledge, prior experience, and other 
factors is not known. Value-added analysis can help in identifying teachers who seem to obtain 
very high levels of growth from their students. But, it cannot tell you what they are doing to get 
that growth and how much of the growth is due to the teacher versus other factors unique to their 
particular school or group of students. 


Summary: Many issues with VAM at the teacher/classroom level can affect its use as a single indicator of teacher success or effectiveness. The most common recommendation from the research reviewed here is not to use VAM as the sole measure of an individual teacher’s effectiveness for high-stakes decisions.

It is important to point out that although reviews identify some problems of value-added analysis 
for high-stakes personnel decisions, they also point to appropriate uses and possible needs for 
value-added data at the teacher level such as: 

• Identifying teachers who are struggling with student achievement and examining ways to 
help them (if other evidence also indicates problems). 

• Providing feedback to teacher preparation institutions on how their graduates are doing at growing student achievement as they go through the induction process.

• Identifying schools or particular grade levels that may have higher numbers of teachers 
with low value-added scores to identify teacher assignment issues and explore more 
targeted assistance. 

• Evaluating student outcomes of various programmatic interventions instituted at the 
teacher level (e.g., professional development programs, incentives to move higher-quality 
teachers into lower-achieving schools). 

Examining patterns of student growth by classrooms/teachers, albeit imperfect, is clearly a 
valuable exercise in the continuous improvement of educational services and discussions of 
individual teachers’ effectiveness. However, most reviewers caution against using VAM 
estimates as the sole measure of individual teacher performance, especially when making
high-stakes decisions regarding teacher compensation and retention/tenure.




Sample of State and District Programs 


There are very few well-developed systems that evaluate teacher effectiveness that include 
student achievement data (as measured by standardized test scores). Initially, the issue was the 
lack of student and teacher identifiers. However, most states currently have at least a student
identifier, and many are creating teacher identifiers (see the Data Quality Campaign,
http://www.dataqualitycampaign.org/). Yet, the issues that remain relate to whether states and
districts want student- and teacher-identified data linked and whether there are state policies that 
dictate how the data can be utilized. 
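Linking test scores to teachers depends on a roster that connects student identifiers to teacher identifiers. This hypothetical sketch performs that join and reports the link rate, the kind of data-quality check worth running before any value-added analysis. The record layouts, IDs, and scores are invented, and the sketch assumes one teacher per student.

```python
# Hypothetical records; real systems would draw these from state or district databases.
test_scores = {"S01": 412, "S02": 398, "S03": 430, "S04": 405}  # student_id -> score
course_roster = [("S01", "T10"), ("S02", "T10"), ("S03", "T11")]  # (student_id, teacher_id)


def link_scores_to_teachers(scores, roster):
    """Attach each tested student's score to a teacher via the roster.
    Returns (linked records, tested students with no teacher link)."""
    teacher_of = dict(roster)  # assumes one teacher per student
    linked = [(sid, teacher_of[sid], score)
              for sid, score in scores.items() if sid in teacher_of]
    unlinked = [sid for sid in scores if sid not in teacher_of]
    return linked, unlinked


linked, unlinked = link_scores_to_teachers(test_scores, course_roster)
print(f"link rate: {len(linked) / len(test_scores):.0%}; unlinked: {unlinked}")
# link rate: 75%; unlinked: ['S04']
```

Real rosters often attach several teachers to one student (team teaching, mid-year moves), which would require a list of teachers per student rather than a plain dictionary.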

In this section, we present two tables of established systems with growth models and highlight 
their uses of student achievement data. The first table looks at states and districts that are using 
student achievement data to make compensation decisions (e.g., merit pay). The second table 
focuses on states that require the use of student achievement data (as measured by standardized 
test scores) as a part of teacher evaluations. 

The first table describes examples of a few districts and states that are using student achievement data as part of a performance-based compensation initiative. Although VAMs were not required as part of the U.S. Department of Education Teacher Incentive Fund (TIF), it appears that most states included VAMs in their proposals.


Table 2: A few states and districts using student achievement as part of a performance-based compensation initiative

State/District: Performance Incentive Program

Dallas Independent School District (DISD)
• Created Teacher and Principal Incentive Advisory Council
• DISD has several programs, including:
o TIF (Principal and Teacher Incentive Fund)
o DATE (District Awards for Teacher Excellence)
o Performance Pay Program
o Texas Educator Excellence Award (Texas Education Agency)

Florida: Merit Award Program (formerly STAR)
• Districts apply to the program
• Allows awards to be determined by individual or instructional team performance:
o 60% - Student learning gains, i.e., student achievement as measured by standardized tests
o 40% - Principal/supervisor evaluation

Minnesota: Quality Compensation for Teachers (Q-Comp)
• Started in July 2005
• Provides funding for teacher-compensation systems
• Established rigorous standards for measuring student achievement and teacher quality
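The 60/40 split in Florida's Merit Award Program amounts to a weighted composite. The sketch below assumes, hypothetically, that both components have already been converted to a common 0-100 scale; how the program actually scales each component is not described here, and the function name is invented.

```python
def merit_award_score(learning_gains, evaluation,
                      gains_weight=0.60, evaluation_weight=0.40):
    """Weighted composite in the spirit of the 60/40 split described above.
    Both inputs are assumed (hypothetically) to be on the same 0-100 scale."""
    if abs(gains_weight + evaluation_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return gains_weight * learning_gains + evaluation_weight * evaluation


print(round(merit_award_score(70, 90), 2))  # 78.0
```

Nominal weights are not the whole story: if one component varies much more across teachers than the other, it will drive most of the differences in the composite regardless of the 60/40 split.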




As noted earlier, using student achievement to measure teacher effectiveness is only one of several methods currently used, and when it is included it does not necessarily refer to standardized test scores. In fact, student achievement data are probably the least common teacher evaluation tool, both because of state policies regarding the use of such data to evaluate individual teachers (e.g., Minnesota, Pennsylvania, Tennessee, Utah) and because of underdeveloped longitudinal data systems (e.g., Idaho, Indiana, Maine, Missouri, Montana). Therefore, student growth is typically measured through teacher-developed assessments, benchmark assessments, student work, portfolios, and lesson plans.

In those states that are using VAMs, the data are typically used to assess what is happening at the 
building level or to develop professional development plans (e.g., Pennsylvania). The data are
also used to calculate the number of Highly Qualified Teachers (HQT). Overall, the states leave 
the utilization of the data up to the district (e.g., Pennsylvania, Tennessee, Utah, Wisconsin). 

• In Tennessee (Tennessee Value-Added Assessment System -TVAAS), part of the state’s 
agreement with teachers was that the data would be confidential between the teachers and 
their principal and not used to evaluate individual teachers. The evaluation of teachers is 
up to the school districts. 

• In Wisconsin, it is against state law to use student test results to evaluate teacher performance, to discipline teachers, or as a reason not to renew their contracts (Wisconsin Statute 118.30[2]4[c]).

• Utah (U-Pass Accountability Plan) leaves it up to the districts to decide if they will 
analyze the data linking student achievement to individual teachers. 

Some states have chosen not to link student and teacher identifiers; the West Virginia Department of Education, for example, does not believe there is enough research on growth methods.


Table 3. Two states requiring use of student achievement data (as measured by standardized test scores) as part of teacher evaluations

State: Used for Teacher Evaluations

Florida
• District Performance Appraisal Checklist
• Section 1012.34, Florida Statutes, requires that assessment procedures for instructional personnel and school administrators be based on the performance of students assigned to their classrooms or schools, as appropriate. Student performance must be measured by the required state assessments and by local assessments for subjects and grade levels not measured by the state. (FLDOE website, http://www.fldoe.org/profdev/pa.asp)

Louisiana
• Testing Value Added Teacher Preparation Program Assessment Model
• Only for novice teachers
• Grades 4-9
• Has the “capacity to examine the growth of achievement of children and link growth in student learning to teacher preparation programs”
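Louisiana's model links student growth to teacher preparation programs. A minimal sketch of the aggregation step, assuming teacher-level effect estimates already exist, averages them by program and keeps the count of teachers behind each average. Program names, teacher IDs, and values are hypothetical; this is not the Louisiana model itself.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (teacher_id, preparation_program, estimated_effect) records.
novice_estimates = [
    ("T1", "Program X", 0.10), ("T2", "Program X", -0.05), ("T3", "Program X", 0.04),
    ("T4", "Program Y", -0.12), ("T5", "Program Y", 0.02),
]


def program_summary(rows):
    """Average estimated effect and number of teachers per preparation program."""
    by_program = defaultdict(list)
    for _teacher, program, effect in rows:
        by_program[program].append(effect)
    return {prog: (fmean(effs), len(effs)) for prog, effs in by_program.items()}


for prog, (avg, n) in program_summary(novice_estimates).items():
    print(f"{prog}: mean effect {avg:+.3f} across {n} novice teachers")
```

Keeping the count matters for the same reasons raised under stability and missing data: an average built on a handful of novice teachers is much less precise than one built on many.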




The primary use for value-added modeling has been as an approach to school and district 
accountability. That is, growth models of student achievement are perceived as an improvement 
over cohort or status models that just report the percent of students achieving at various levels 
each year. However, as more experience with growth models has developed, other uses beyond 
accountability-reporting at the school and district level are being explored. For example, there 
are states that are exploring VAM scores as a way of evaluating/comparing teacher-preparation 
institutions (see Louisiana above). In addition, there are some that are using this information on 
individual teachers in conjunction with a merit pay initiative (e.g., U.S. Department of Education
TIF, state-created initiatives). Most districts have chosen not to use the linked data to evaluate teacher
performance. 

What about challenges? 

None of the program descriptions pointed to specific challenges in implementation, except for 
Florida and Denver’s ProComp initiative (teacher compensation). However, there were several 
reports that discussed lessons learned in developing a performance-compensation system that 
linked student and teacher records (Bergner, Steiny, & Armstrong, 2007; National Institute for 
Excellence in Teaching, July 2007). These lessons learned are summarized here: 

1. Stakeholder involvement. Involve stakeholders from conception through implementation. Buy-in is key to linking teacher and student identifiers and then using those data as part of a teacher-evaluation system. Although the literature focuses on non-SEA stakeholders, it is also important that all education agencies/departments that collect data collaborate. While some districts have voted on using value-added measures as part of teacher evaluation or compensation systems (similar to comprehensive school reform recommendations), other educational systems have given teachers the option to participate. Still other initiatives involve only specific schools and/or districts (e.g., Amphitheater Unified School District [AZ], Benwood Initiative [TN], Guilford County [NC]). Delaware held focus groups with stakeholders, while Washington created advisory councils. Regardless of how buy-in is obtained, teachers, administrators, legislators, parents, and unions should have an opportunity to share their concerns.

2. Be transparent. Being transparent about how the system will be built and how the data
will be used can help calm fears. Transparency depends on communication: information
about the value-added model should be shared with stakeholders through various media,
with links to relevant research reports and other useful materials. Developers should be
explicit about why the system is the best option for supporting student and staff growth.

3. Ensure confidentiality and security of individual records. Early teacher identification
numbers were Social Security numbers, a practice that has since changed. FERPA (the
Family Educational Rights and Privacy Act) protects the privacy of students’ education
records, and teachers’ records also warrant protection; system developers should therefore
pay special attention to protecting both when building the system and creating reports.
Badolato (2007) argues that a system of checks and balances is needed to ensure privacy
and guide use (p. 12).

4. Provide training to use the system. This is an extension of stakeholder involvement.
Whether the system is SEA- or LEA-based, administrators and teachers should be trained in
how to use the system and generate reports that meet their needs. For example, Battelle
for Kids and the Ohio Department of Education provide ongoing training on how to use
value-added information.

The original question was, “What are some effective or well-developed systems to evaluate 
teacher effectiveness that include student-learning impact data?” Through our search, we did not
find well-developed systems that evaluate teacher effectiveness using state achievement data.
However, we found evidence that some states provide guidance on how to include student-learning
measures drawn from other sources in teacher evaluation. We also found multiple examples of
value-added models being used as part of performance-pay programs. Overall, most
states and districts advocate the use of some sort of student-learning measure in teacher
evaluations, but in general they do not use a value-added model as part of their
teacher-evaluation system.


VAM-Related Resources


Organizations 

• 2004 CCSSO Brain Trust on Use of Growth Models Based on Student-Level Data in
School Accountability Conference: November 15-16, 2004, at the Holiday Inn on the
Hill, Washington, DC; conference agenda, session PowerPoint presentations, and notes
are available here: http://www.ccsso.org/projects/Accountability_Systems/5508.cfm

• 2004 Journal of Educational and Behavioral Statistics (Vol. 29, No. 1): entire issue
dedicated to VAM, with contributions by Ballou, Lockwood, McCaffrey, Raudenbush,
Rubin, Sanders, and others

• Data Quality Campaign: http://www.dataqualitycampaign.org/

• Education Value-Added Assessment System (EVAAS): Bill Sanders’ model for North
Carolina and an extension of the TVAAS model: http://www.dpi.state.nc.us/evaas/;
several papers are archived at the SAS site:

http://www.sas.com/govedu/edu/services/effectiveness.html

• National Center on Performance Incentives (NCPI): Based at Vanderbilt
University’s Peabody College and funded by ED/IES to conduct research on teacher
performance incentives; its site includes several working papers on VAM:

http://www.performanceincentives.org/index.asp

• National Comprehensive Center for Teacher Quality (TQ Center): The national content
center on teacher quality, jointly run by ETS, Learning Point Associates, and Vanderbilt
University: http://www.tqsource.org/

• National Conference on Value-Added Modeling: April 22-24, 2008, at the University of
Wisconsin-Madison; all session papers and select PowerPoint presentations are located
here: http://www.wcer.wisc.edu/news/events/natConf_papers.php

• National Council on Teacher Quality: http://www.nctq.org/p/

• National Institute for Excellence in Teaching (NIET)/Teacher Advancement Program
(TAP): The center, originally developed by the Milken Family Foundation and now
operated by NIET, coordinates TAP and conducts various teacher-quality studies:
http://www.talentedteachers.org/

• Pennsylvania Value-Added Assessment System (PVAAS):
http://www.pde.state.pa.us/a_and_t/cwp/view.asp?A=108&Q=108916

• Teacher Quality Research (TQR): Doug Harris and Tim Sass’ website for their IES-funded
teacher-effects research, including VAM studies:
http://www.teacherqualityresearch.org/

• Value-Added Conference at the University of Maryland: October 21-22, 2004, with
presentations by Alban, Schatz, & Von Secker; Ballou; Braun; Cunningham & Stone;
Doran & Cohen; McCaffrey; Bryk & Ponisciak; Stevens; and Schmidt, Houang, &
McKnight; session PowerPoint slides are available as well as some papers:
http://www.education.umd.edu/EDMS/MARCES/conference/value_added/ and
http://www.education.umd.edu/EDMS/MARCES/conference/value_added/valueadd.htm

• Value-Added Measures: Implications for Policy and Practice Conference: May 23, 
2008, at the Urban Institute, Washington, DC; all session PowerPoints are available as 
well as mp3 audio: http://www.caldercenter.org/events/valueadded.cfm 

• Value-Added Research Center (VARC): http://varc.wceruw.org/ 

• Wisconsin Center for Education Research (WCER): http://www.wceruw.org/index.php




Resources 

AERA (2004). Teachers matter: Evidence from value-added assessments. Research
Points, 2(2). Published by the American Educational Research Association.

Amrein-Beardsley, A. (2008). Methodological concerns about the education value-added
assessment system. Educational Researcher, 37(2), 65-75.

Andrabi, T., Das, J., Khwaja, A.I., & Zajonc, T. (2008). Do value-added estimates add value? 
Accounting for learning dynamics (Working Paper No. 158). Cambridge, MA: Center for 
International Development, Harvard University. Retrieved June 19, 2009, from 
http://www.cid.harvard.edu/cidwp/pdf/158.pdf 

Badolato, V. (2007). Addressing the need for better data on teaching in Colorado - Unique 
Teacher Identifier: Stakeholder Process Report. Denver, CO: The Alliance for Quality 
Teaching. 

Ballou, D. (2004). Rejoinder. Journal of Educational and Behavioral Statistics, 29(1),
131-134.

Ballou, D. (2008). Test scaling and value-added measurement (Working Paper 2008-23).
Nashville, TN: National Center on Performance Incentives. Retrieved June 19, 2009, from
http://www.performanceincentives.org/data/files/news/PapersNews/BALLOU_2008REV2_FINAL.pdf

Ballou, D., Sanders, W., & Wright, P. (2004). Controlling for student background in value-added 
assessment of teachers. Journal of Educational and Behavioral Statistics, 29(1), 37-65. 

Bergner, T., Steiny, J., & Armstrong, J. (2007). Benefits of and lessons learned from linking 
teacher and student data. Austin, TX: Data Quality Campaign. 

Betebenner, D. (2004). An analysis of school district data using value-added methodology (CSE
Report 622). Los Angeles: National Center for Research on Evaluation, Standards, and
Student Testing (CRESST), University of California, Los Angeles. Retrieved June 19, 2009,
from http://www.cse.ucla.edu/products/Reports/R622.pdf

Betebenner, D.W. (2008). Norm- and criterion-referenced student growth. Dover, NH: Center for
Assessment/National Center for the Improvement of Educational Assessment. Retrieved
March 26, 2009, from http://www.nciea.org/publications/normative_criterion_growth_DB08.pdf

Betebenner, D.W. (2009). Growth, standards and accountability. Dover, NH: Center for
Assessment/National Center for the Improvement of Educational Assessment. Retrieved June
23, 2009, from http://www.nciea.org/publications/growthandStandard_DB09.pdf

Braun, H.I. (2005, September). Using student progress to evaluate teachers: A primer on
value-added models. Princeton, NJ: Educational Testing Service.




Braun, H., Qu, Y., & Trapani, C. (2008). Robustness of a value-added assessment of school 
effectiveness (RR-08-22). Princeton, NJ: Educational Testing Service. 

Bryk, A.S., Thum, Y.M., Easton, J.Q., & Luppescu, S. (1998). Academic productivity of Chicago
Public Elementary Schools (Technical Report). Chicago: Consortium on Chicago School
Research. Retrieved June 19, 2009, from http://ccsr.uchicago.edu/publications/p0a010.pdf

Goe, L. (2008). Key issue: Using value-added models to identify and support highly effective
teachers. Washington, DC: National Comprehensive Center for Teacher Quality. Retrieved
April 24, 2009, from http://www2.tqsource.org/strategies/het/UsingValueAddedModels.pdf

Goe, L., & Croft, A. (2009). Methods of evaluating teacher effectiveness. Washington,
DC: National Comprehensive Center for Teacher Quality.

Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A
research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality.

Goldhaber, D., & Hansen, M. (2008a). Is it just a bad class? Assessing the stability of
measured teacher performance (Working Paper #2008-5). Seattle, WA: Center on
Reinventing Public Education. Retrieved April 17, 2009, from
http://www.crpe.org/cs/crpe/download/csr_files/wp_crpe5_badclass_nov08.pdf

Goldhaber, D., & Hansen, M. (2008b). Assessing the potential of using value-added
estimates of teacher job performance for making tenure decisions (Policy Brief 3).
Washington, DC: National Center for Analysis of Longitudinal Data in Education Research.

Goldschmidt, P., Roschewski, P., Choi, K., Auty, W., Hebbler, S., Blank, R., & Williams, A.
(2005). Policymakers’ guide to growth models for school accountability: How do
accountability models differ? Washington, DC: Council of Chief State School Officers.
Retrieved June 17, 2009, from http://www.ccsso.org/publications/details.cfm?PublicationID=287


Hanushek, E.A., Kain, J.F., & Rivkin, S.G. (1998). Teachers, schools, and academic achievement
(Working Paper No. 6691). Cambridge, MA: National Bureau of Economic Research.
http://libproxy.uncg.edu:2790/papers/w6691.pdf

Harris, D., & Sass, T.R. (2006). Value-added models and the measurement of teacher
quality. Unpublished. Tallahassee, FL: Florida State University.
http://mailer.fsu.edu/~tsass/Papers/IES%20Harris%20Sass%20EPF%20Value-added%2014.pdf

Harris, D.N., & Sass, T.R. (2007). What makes for a good teacher and who can tell?
Unpublished. Tallahassee, FL: Florida State University.
http://mailer.fsu.edu/~tsass/Papers/IES%20Principal%20Eval%2017B.pdf

Heck, R.H. (2009). Teacher effectiveness and student achievement: Investigating a multilevel 
cross-classified model. Journal of Educational Administration, 47(2), 227-249. 




Hibpshman, T. (2004). A review of value-added models. Frankfort, KY: Kentucky Education
Professional Standards Board. Retrieved June 17, 2009, from
http://www.kyepsb.net/documents/Stats/Journals/Heterogeneity%20of%20regression.pdf

Hong, G., & Raudenbush, S. W. (2008). Causal inference for time-varying instructional 
treatments. Journal of Educational and Behavioral Statistics, 33(3), 333-362. 

Jacob, B.A., & Lefgren, L. (2006). Principals as agents: Subjective performance
measurement in education (Working Paper No. 11463). Cambridge, MA: National Bureau of
Economic Research. http://libproxy.uncg.edu:2790/papers/w11463.pdf

Jacob, B.A., & Lefgren, L. (2008). Can principals identify effective teachers? Evidence
on subjective performance evaluation in education. Journal of Labor Economics, 26(1),
101-136.

Jacob, B.A., Lefgren, L., & Sims, D. (2008). The persistence of teacher-induced learning gains
(Working Paper No. 14065). Cambridge, MA: National Bureau of Economic Research.
http://libproxy.uncg.edu:2790/papers/w14065

Kain, J.F. (1998). The impact of individual teachers and peers on individual student
achievement. Paper presented at the Association for Public Policy Analysis and Management
(APPAM) 20th Annual Research Conference, New York, NY. Retrieved June 19, 2009, from
http://www.mccsc.edu/~curriculum/Impact%20of%20teachers%20on%20achievement.pdf

Kane, T.J., & Staiger, D.O. (2008). Estimating teacher impacts on student achievement: An
experimental evaluation (Working Paper No. 14607). Cambridge, MA: National Bureau of
Economic Research. http://libproxy.uncg.edu:2790/papers/w14607.pdf

Koedel, C. (In press). An empirical analysis of teacher spillover effects in secondary 
school. Economics of Education Review. 

Koedel, C., & Betts, J. (2009a). Value-added to what? How a ceiling in the testing
instrument influences value-added estimation (Working Paper No. 14778). Cambridge, MA:
National Bureau of Economic Research. http://libproxy.uncg.edu:3557/papers/w14778

Koedel, C., & Betts, J. (2009b). Does student sorting invalidate value-added models of
teacher effectiveness? An extended analysis of the Rothstein critique (Working Paper 09-02).
Columbia, MO: Department of Economics, University of Missouri. Retrieved April 21, 2009,
from http://economics.missouri.edu/working-papers/2009/WP0902_koedel.pdf

Kupermintz, H. (2003). Teacher effects and teacher effectiveness: A validity investigation of the 
Tennessee Value Added Assessment System. Educational Evaluation and Policy Analysis, 
25(3), 287-298. 

Linn, R.L. (2008). Methodological issues in achieving school accountability. Journal of 
Curriculum Studies, 40(6), 699-711. 




Lissitz, R.W. (ed.; 2005). Value added models in education: Theory and applications. Maple
Grove, MN: JAM Press.

Lissitz, R.W. (ed.; 2006). Longitudinal and value added models of student performance. Maple
Grove, MN: JAM Press.

Little, O., Goe, L., & Bell, C. (2009). A practical guide to evaluating teacher
effectiveness. Washington, DC: National Comprehensive Center for Teacher Quality.

Lockwood, J.R., Le, V.N., Stecher, B.M., & Hamilton, L.S. (2005). A value-added modeling
approach for examining the relationship between reform teaching and mathematics
achievement (Working Paper WR-262-EDU). Santa Monica, CA: RAND Corporation.
http://www.rand.org/pubs/working_papers/2005/RAND_WR262.pdf

Lockwood, J.R., & McCaffrey, D.F. (2007a). Controlling for student heterogeneity in
longitudinal achievement models (Working Paper WR-471-IES). Santa Monica, CA: RAND
Corporation. http://www.rand.org/pubs/working_papers/2007/RAND_WR471.pdf

Lockwood, J.R., & McCaffrey, D.F. (2007b). Controlling for individual heterogeneity in
longitudinal models, with applications to student achievement. Electronic Journal of
Statistics, 1, 223-252.

Lockwood, J.R., McCaffrey, D.F., Hamilton, L.S., Stecher, B., Le, V. N., & Martinez,
J.F. (2007). The sensitivity of value-added teacher effect estimates to different mathematics
achievement measures. Journal of Educational Measurement, 44(1), 47-67.

Lockwood, J.R., McCaffrey, D.F., Mariano, L.T., & Setodji, C. (2007). Bayesian methods for 
scalable multivariate value-added assessment. Journal of Educational and Behavioral 
Statistics, 32(2), 125-150. 

Martineau, J. A. (2006). Distorting value added: The use of longitudinal, vertically scaled 
student achievement data for growth-based, value-added accountability. Journal of 
Educational and Behavioral Statistics, 31(1), 35-62. 

Mathers, C., Oliva, M., & Laine, S.W.B. (2008, February). Improving instruction through 
effective teacher evaluation: Options for states and districts. Washington, DC: National 
Comprehensive Center for Teacher Quality. 

McCaffrey, D.F., & Hamilton, L.S. (2007). Value-added assessment in practice: Lessons from
the Pennsylvania Value-Added Assessment System Pilot Project (Technical Report
TR-506-CC). Santa Monica, CA: RAND Corporation.
http://www.rand.org/pubs/technical_reports/2007/RAND_TR506.pdf

McCaffrey, D.F., Koretz, D.M., Lockwood, J.R., & Hamilton, L.S. (2004). The
promise and peril of using value-added modeling to measure teacher effectiveness (Research
Brief RB-9050-EDU). Santa Monica, CA: RAND Corporation.
http://www.rand.org/pubs/research_briefs/2005/RAND_RB9050.pdf




McCaffrey, D.F., Lockwood, J.R., Koretz, D.M., & Hamilton, L.S. (2003). Evaluating
value-added models for teacher accountability. Santa Monica, CA: RAND Corporation.
http://www.rand.org/pubs/monographs/2004/RAND_MG158.pdf

McCaffrey, D.F., Lockwood, J.R., Koretz, D., Louis, T.A., & Hamilton, L. (2004a). Models for 
value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 
29(1), 67-101. 

McCaffrey, D.F., Lockwood, J.R., Koretz, D., Louis, T.A., & Hamilton, L. (2004b). Let’s see 
more empirical studies on value-added modeling of teacher effects: A reply to Raudenbush, 
Rubin, Stuart and Zanutto, and Reckase. Journal of Educational and Behavioral Statistics, 
29(1), 139-143. 

McCaffrey, D.F., Sass, T.R., Lockwood, J.R., & Mihaly, K. (2009). The inter-temporal 
variability of teacher effect estimates. Unpublished. Tallahassee, FL: Florida State 
University. http://mailer.fsu.edu/~tsass/Papers/MS2804 1 -20feb2009 1 .pdf 

Millman, J. (ed.; 1997). Grading teachers, grading schools: Is student achievement a valid
evaluation measure? Thousand Oaks, CA: Corwin Press.

National Association of State Boards of Education (2005). Evaluating value-added:
Findings and recommendations from the NASBE Study Group on value-added assessments.
Alexandria, VA: Author. http://www.nasbe.org/bookstore/product/path/22/product_id/13

National Institute for Excellence in Teaching, Working Group on Teacher Quality (2007, July). 
Creating a successful performance compensation system for educators. Washington, DC: 
National Institute for Excellence in Teaching. 

National Institute for Excellence in Teaching (2008). Roundtable discussion on value-added
analysis of student achievement: A summary of findings. Santa Monica, CA: Author.
Retrieved June 17, 2009, from http://www.talentedteachers.org/pubs/value_added_roundtable_08.pdf

Ponisciak, S., & Bryk, A.S. (2005). Value-added analysis of the Chicago Public Schools: An
application of hierarchical models. In R.W. Lissitz (ed.), Value added models in education:
Theory and applications (pp. 40-79). Maple Grove, MN: JAM Press.

Raudenbush, S.W. (2004a). What are value-added models estimating and what does this imply 
for statistical practice? Journal of Educational and Behavioral Statistics, 29(1), 121-129. 

Raudenbush, S.W. (2004b). Schooling, statistics, and poverty: Can we measure school
improvement? Princeton, NJ: Educational Testing Service; Policy, Evaluation and Research
Center. http://www.etsliteracy.net/Media/Education_Topics/pdf/angoff9.pdf

Rivkin, S.G., Hanushek, E.A., & Kain, J.F. (2005). Teachers, schools, and academic
achievement. Econometrica, 73(2), 417-458.




Ross, S.M., Stringfield, S., Sanders, W.L., & Wright, S.P. (2003). Inside systemic elementary
school reform: Teacher effects and teacher mobility. School Effectiveness and School
Improvement, 14(1), 73-110.

Rothstein, J. (2008). Teacher quality in educational production: Tracking, decay, and
student achievement (Working Paper No. 14442). Cambridge, MA: National Bureau of
Economic Research. http://libproxy.uncg.edu:2790/papers/w14442.pdf

Rothstein, J. (2009a). Student sorting and bias in value added estimation: Selection on
observables and unobservables (Working Paper No. 14666). Cambridge, MA: National
Bureau of Economic Research. http://libproxy.uncg.edu:2790/papers/w14666.pdf

Rothstein, J. (2009b). Teacher quality in educational production: Tracking, decay, and student
achievement (Working Paper). Princeton, NJ: Princeton University Department of
Economics. Retrieved June 10, 2009, from
http://www.princeton.edu/~jrothst/published/rothstein_vam_may152009.pdf

Rowan, B., Correnti, R., & Miller, R. (2002). What large-scale survey research tells us about
teacher effects on student achievement: Insights from the Prospects study of elementary
schools. Teachers College Record, 104(8), 1525-1567.

Rubin, D.B., Stuart, E.A., & Zanutto, E.L. (2004). A potential outcomes view of value-added
assessment in education. Journal of Educational and Behavioral Statistics, 29(1), 103-116.

Sanders, W.L. (2006). Comparisons among various educational assessment value-added models.
Paper presented at The Power of Two: National Value-Added Conference, Columbus, OH.
Retrieved May 18, 2009, from http://www.sas.com/govedu/edu/services/vaconferencepaper.pdf

Sanders, W.L., & Rivers, J.C. (1996). Cumulative and residual effects of teachers on
future student academic achievement. University of Tennessee, Value-Added Research and
Assessment Center.

Sanders, W.L., Saxton, A.M., & Horn, S.P. (1997). The Tennessee Value-Added Assessment
System: A quantitative outcomes-based approach to educational assessment. In J. Millman
(ed.), Grading teachers, grading schools: Is student achievement a valid educational
measure? (pp. 137-162). Thousand Oaks, CA: Corwin Press. Retrieved June 17, 2009, from
http://www.sas.com/govedu/edu/sanderssaxtonhorn.pdf

Sanders, W.L., & Wright, S.P. (2008). A response to Amrein-Beardsley (2008) “Methodological
concerns about the education value-added assessment system.” Unpublished. Cary, NC: SAS
Institute Inc. Retrieved June 19, 2009, from
http://www.sas.com/govedu/edu/services/Sanders_Wright_response_to_Amrein-Beardsley_4_14_2008.pdf




Sass, T.R. (2008). The stability of value-added measures of teacher quality and
implications for teacher compensation policy (Policy Brief 4). Washington, DC: National
Center for Analysis of Longitudinal Data in Education Research. Retrieved April 17, 2009,
from http://www.urban.org/uploadedpdf/1001266_stabilityofvalue.pdf

Schug, M.C., & Niederjohn, M.S. (2009, May). Value added testing: Improving state 
testing and teacher compensation in Wisconsin. WPRI Reports, 22(4). Hartland, WI: 
Wisconsin Policy Research Institute. 

Seltzer, M., Choi, K., & Thum, Y.M. (2003). Examining relationships between where students
start and how rapidly they progress: Using new developments in growth modeling to gain
insight into the distribution of achievement within schools. Educational Evaluation and
Policy Analysis, 25(3), 263-286.

Tekwe, C.D., Carter, R.L., Ma, C-X, Algina, J., Lucas, M.E., Roth, J., Ariet, M., Fisher, T., & 
Resnick, M.B. (2004). An empirical comparison of statistical models for value-added 
assessment of school performance. Journal of Educational and Behavioral Statistics, 29(1), 
11-36. 


Thum, Y.M. (2003a). Measuring progress towards a goal: Estimating teacher productivity using 
a multivariate multilevel model for value-added analysis. Sociological Methods and 
Research, 32(2), 153-207. 

Thum, Y.M. (2003b). No Child Left Behind: Methodological challenges & recommendations for
measuring adequate yearly progress (CSE Technical Report 590). Los Angeles: National
Center for Research on Evaluation, Standards, and Student Testing (CRESST),
University of California, Los Angeles. Retrieved June 17, 2009, from
http://www.cse.ucla.edu/products/Reports/TR590.pdf

Thum, Y.M., & Bryk, A.S. (1997). Value-added productivity indicators. In J. Millman (ed.), 
Grading teachers, grading schools: Is student achievement a valid evaluation measure? (pp. 
137-162). Thousand Oaks, CA: Corwin Press. 

Toch, T., & Rothman, R. (2008). Rush to judgment: Teacher evaluation in public 
education. Washington, DC: Education Sector. 

Todd, P.E., & Wolpin, K.I. (2003). On the specification and estimation of the production
function for cognitive achievement. Economic Journal, 113(485), F3-F33.

Webster, W.J., & Mendro, R.L. (1997). The Dallas value-added accountability system. In J.
Millman (ed.), Grading teachers, grading schools: Is student achievement a valid evaluation
measure? (pp. 81-99). Thousand Oaks, CA: Corwin Press. Retrieved June 19, 2009, from
http://www.dallasisd.org/eval/research/articles/Webster-Dallas-Value-Added-Accountability-System.pdf

Wright, S.P., & Sanders, W.L. (2008). Decomposition of estimates in a layered value-added
assessment model. Paper presented at the National Conference on Value-Added Modeling,
University of Wisconsin-Madison. Retrieved May 11, 2009, from
http://www.wcer.wisc.edu/news/events/WrightSanders_Decomposition.pdf

Wright, S.P., Sanders, W.L., & Rivers, J.C. (2006). Measurement of academic growth of
individual students toward variable and meaningful academic standards. In R.W. Lissitz
(ed.), Longitudinal and value added models of student performance (pp. 385-406). Maple
Grove, MN: JAM Press. Retrieved June 19, 2009, from
http://www.sas.com/govedu/edu/wrightsandersrivers.pdf


Methodology 

To answer this request, we searched WilsonWeb (UNCG Education Database) and
ERIC. We also searched Google using the phrases “value-added,” “growth models,”
“teacher evaluation,” “student achievement,” and “performance pay.” Finally, we searched the
websites of the following organizations: U.S. Department of Education, Institute of Education
Sciences, National Comprehensive Center for Teacher Quality, Education Commission of the States,
Council of Chief State School Officers (CCSSO), National Governors Association,
Wisconsin Policy Research Institute, Center for Teaching Quality, National Center on
Performance Incentives, Wisconsin Center for Education Research, and National Bureau of
Economic Research.




REGIONAL EDUCATIONAL LAB August 2009, EBE # 500 




We provide research-based information on
educational initiatives happening nationally and
regionally. The EBE Request Desk is currently taking
requests for:

- Research on a particular topic

- Information on the evidence base for curriculum
interventions or professional development programs

- Information on large, sponsored research projects

- Information on southeastern state policies and
programs

For more information or to make a request, contact: 
Karla Lewis 
1.800.755.3277 
klewis@serve.org 


The Regional Educational Laboratory (REL) - Southeast's Evidence Based Education (EBE) Request Desk is a service provided by a 
collaborative of the REL program, funded by the U.S. Department of Education's Institute of Education Sciences (IES). This response 
was prepared under a contract with IES, Contract ED-06-CO-0028, by REL-Southeast administered by the SERVE Center at the 
University of North Carolina at Greensboro. The content of the response does not necessarily reflect the views or policies of IES or the 
U.S. Department of Education nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. 
Government. 

